Abstract

In the current Internet, there is no clean way for affected 
parties to react to poor forwarding performance: when a 
domain violates its Service Level Agreement (SLA) with 
a contractual partner, the partner must resort to ad-hoc 
probing-based monitoring to determine the existence and 
extent of the violation. Instead, we propose a new, system- 
atic approach to the problem of forwarding-performance 
verification. Our mechanism relies on voluntary report- 
ing, allowing each domain to disclose its loss and delay 
performance to its customers and peers. Most impor- 
tantly, it enables verifiable performance measurements, 
i.e., domains cannot abuse it to significantly exaggerate 
their performance. Finally, our mechanism is tunable, al- 
lowing each participating domain to determine how many 
resources to devote to it independently (i.e., without any 
inter-domain coordination), exposing a controllable trade- 
off between performance-verification quality and resource 
consumption. Our mechanism comes at the cost of de- 
ploying modest functionality at the participating domains' 
border routers; we show that it requires reasonable re- 
sources, well within modern network capabilities. 

1 Introduction 

The lack of a systematic method for estimating the performance
of Internet service providers (ISPs) is a well-known
problem: when an ISP does not perform as expected, 
there is no clean way for the affected parties to detect 
the problem so they can debug it, ask for compensation 
if a Service-Level Agreement (SLA) has been violated, 
or simply learn from it (e.g., re-assess a peering agree- 
ment with an under-performing neighbor). This lack of 
information makes network debugging difficult and slow, 
even leading ISPs to deny their failures to their customers 
and peers, pointing fingers at one another. One could at- 
tribute this situation to the best-effort nature of the Inter- 
net which, by definition, provides no a-priori guarantees. 
Yet that is no reason not to expect useful, after-the-fact in- 
formation about ISP performance — actually, it makes per- 
fect sense to expect such information in a best-effort en- 
vironment like the Internet, where communication quality 
often relies on quick failure detection and on choosing the 
right providers and peers. 

Since ISPs offer no explicit interface for their cus- 
tomers and peers to verify their performance, the latter 
can only resort to probing tools like traceroute or other active
measurements. Moreover, researchers have recently
started to combine probing from multiple vantage points 
(e.g., PlanetLab nodes) to gain information about ISP per- 
formance that would not be accessible through simple 
probing [14] [13]. This information is typically extracted
from channels with a different purpose (e.g., ICMP traf- 
fic), because probing mechanisms are designed under the 
assumption that ISPs would never freely provide honest 
information about their performance. 

But what if ISPs were willing to export an explicit in- 
terface through which their performance can be queried? 
In this work, we ask the question, how should we design 
such an interface such that it provides accurate and ver- 
ifiable information, while it can be implemented using 
a reasonable, tunable amount of resources? On the one 
hand, we find this to be an interesting thought experiment. 
On the other hand, we identify two strong, albeit perhaps 
unintuitive, reasons why an ISP may willingly expose its 
performance problems to the outside world. 

First, ISPs often need to exchange performance infor- 
mation anyway with their customers and peers, in order 
to handle customer complaints. When a customer calls 
her ISP to complain that she cannot reach a certain des- 
tination, the ISP needs to know whether the problem lies 
in its own local network, the customer's network, the net- 
work of the peer that is handling traffic to that destination, 
or the destination's network — because each of these cases 
warrants a different response. Today, this information is 
acquired by ISP operators in a reactive, ad-hoc manner, 
which means that it takes time to resolve each complaint, 
potentially leaving customers dissatisfied. It makes sense 
that an ISP would prefer to collaborate with its customers 
and peers and willingly exchange troubleshooting reports 
with them, provided that it can trust these reports to be 
accurate and honest. 

Second, it makes sense that an ISP would prefer to 
report its own performance rather than have its perfor- 
mance evaluated by untrusted entities, through potentially 
inaccurate mechanisms. Probing or other edge-based 
"black-box" mechanisms typically run on coalitions of 
end-systems like PlanetLab; the ISP has no reason to trust 
these, and they can provide no guarantee for the accuracy 
of their measurements. If an ISP's performance is to be 
talked about anyway, an accurate, trusted self-reporting 
mechanism may be preferable to the ISP, because, at least, 
it provides the ISP with control over the quality and quan- 
tity of the information that is revealed about its business. 



Self-reporting is not necessarily better or worse than 
edge-based probing; each approach has different pros and 
cons. On the one hand, while probing is effective for 
localizing persistent outages or high-rate drop patterns, 
it provides no reliable indicator of the fate of non-probe 
traffic: probes can be treated differently, either by de- 
sign (e.g., ICMP packet responses are generated off the 
fast path of routers), or by "strategic thinking" (treating 
probe packets preferentially to improve externally per- 
ceived performance). On the other hand, whereas probing 
is simple and requires no changes in ISPs, a self-reporting 
mechanism by necessity requires some extra complexity 
in the control- and data-plane mechanisms of the Inter- 
net's forwarding fabric. 

In the rest of the paper, we describe a self-reporting 
mechanism for verifiable network-performance measure- 
ments (or VPM, for brevity). According to VPM, each 
ISP's loss and delay performance is cooperatively esti- 
mated by the ISP itself and the other network domains 
(customers and peers) that carry its traffic. Its key fea- 
tures are: (1) It enables accurate estimation of ISP perfor- 
mance, without revealing any information about the in- 
ternal structure or routing policies of ISPs beyond what 
is already publicly available through BGP routing tables. 
(2) ISPs cannot abuse it to significantly exaggerate their 
performance. (3) It allows each ISP to choose its own 
cost/quality trade-off independently from others, yet in 
a way that does not compromise the verifiability of the 
derived measurements. These features come at the cost 
of deploying new functionality at the participating do- 
mains' border routers, but we show that the correspond- 
ing memory, processing, and bandwidth requirements are 
well within the capabilities of modern networks. 

We start, in Section 2, with a high-level description of
our approach, followed by a more precise problem statement
and our assumptions. Section 3 explains why existing
protocols or straightforward combinations of existing
techniques fail to provide an appropriate solution.
Section 4 describes what kind of information VPM collects
and disseminates among participating domains. Sections 5
and 6 describe how VPM provides independent
tunability of resource expenditure at different domains
while still achieving high quality of information. Section 7
evaluates VPM experimentally and through back-of-the-envelope
calculations, in terms of its overhead and
information quality provided. Section 8 discusses partial
deployment and related work, and Section 9 concludes.

2 Setup 

In this section, we first describe our approach at a high
level (§2.1), then provide a more concrete problem statement
(§2.2) and state our assumptions (§2.3).

We will use the following terminology. A "domain"
is a contiguous network that falls under one administrative
entity; in the current Internet, a domain would refer
to an edge network or a single Autonomous System
(AS). Each domain has hand-off points (or HOPs) along
its perimeter; these are ingress/egress points, where traffic
enters/exits the domain's jurisdiction (see Figure 1 for
examples). Each HOP is connected to a neighboring domain's
HOP through an inter-domain link; such a link is
considered faulty when it introduces loss or delay beyond
a known specification. We are in particular interested in
packets traversing the same HOP path, i.e., the same sequence
of HOPs; we name such paths according to their
source and destination routing prefixes (that is, origin prefixes
as advertised in BGP).

Figure 1: Circles represent administrative domains. The numbered
boxes represent HOPs. The black arrow represents a HOP
path. Our main example scenario throughout the paper: domain
S sends to domain D a packet set S = {p1, p2, ...} via HOPs 1
to 8.

2.1 Approach 

In VPM, each domain monitors traffic at its HOPs and 
produces receipts for the traffic that enters and exits its 
network. For privacy reasons, a receipt is made avail- 
able only to the domains that observed the corresponding 
traffic. For instance, if any of the domains in Figure 1
produces a receipt for a set of packets {p1, p2, ...} that
crossed domains S, L, X, N, and D, the receipt is made
available only to these particular 5 domains. To ensure
this, each HOP classifies observed traffic per HOP path
and produces a common receipt only for packets that followed
the same HOP path. This implies that when a HOP
observes two packets p1 and p2, the HOP knows (in practice,
can guess with a high probability) whether the two
packets belong to the same HOP path (see Assumption #1 
below). 

Each domain X collects receipts from its neighbors 
with the purpose of estimating each neighbor's loss and 
delay performance with respect to its traffic. Moreover, 
domain X collects receipts from the other domains that 
observed its traffic with the purpose of verifying the cor- 
rectness of its neighbors' receipts. The idea is that if a 
neighbor provides incorrect receipts to exaggerate its own 



performance (e.g., claim that it delivered traffic that it ac- 
tually dropped), these "dishonest" receipts will be incon- 
sistent with the receipts of the other domains on the path. 

We do not worry, in this paper, about how or when re- 
ceipts are disseminated (see Assumption #2 below). A 
domain could request receipts periodically (e.g., once an 
hour or once a day) or arrange to receive them in real time, 
as they are generated. Collecting receipts from all other 
domains that handle a domain's traffic may sound like 
overkill at first — and it would be, if receipts were pro- 
duced per packet or per flow. However, in VPM, receipts 
are produced at coarser granularity, such that each domain 
incurs, due to receipts, less than 0.1% overhead over the 
traffic it observes (§7).

Instead, we focus on the content of the receipts. We 
ask the question, if domains were willing to provide re- 
ceipts on the traffic they receive and deliver, what should 
these receipts consist of, such that (i) they can be gener- 
ated using a reasonable, tunable amount of resources and 
(ii) neighbors can use them to estimate and verify each 
other's performance? 

Threat Model We assume the existence of both honest 
domains that construct their receipts exactly as our pro- 
tocol specifies and lying domains that construct their re- 
ceipts using incomplete or fabricated information. Our 
threat model allows lying domains to collude with others 
towards a common nefarious goal. Nevertheless, a lying 
domain can observe only network traffic that appears lo- 
cally (because it originates at, terminates at, or transits 
that domain), or that has been observed by its colluding 
domains. 

We do not consider, in this paper, the scenario where 
domains modify observed traffic. This is not because this 
scenario is not plausible or not interesting, but because 
it is, to the best of our knowledge, further from current 
ISP practices (than introducing loss or unpredictable de- 
lay and denying performance problems). Moreover, as we 
will see, dealing with loss and delay without considering 
traffic modification is already a challenging enough prob- 
lem to warrant separate treatment. 

2.2 Problem Statement 

Consider a path P, like the one pictured in Figure 1. Suppose
that each HOP in P can disseminate a certain amount
of information to all other HOPs in P. The question is,
what should this information be, such that the following 
conditions are met: 

1. Computability As long as domain X in path P pro- 
duces honest information, X's neighbors in P can use that 
information to compute the loss and delay introduced by 
X in the traffic flowing along P. 

Regarding delay, we are interested in delay quantiles,
e.g., domain L should be able to determine that domain X
introduced delay below 5 msec to 90% of the traffic with
a certain (high) probability π. We are interested in quantiles,
not delay averages, because a domain may exhibit
low average delay at the time scale of seconds or min- 
utes, yet introduce "spikes" of high delay that can impact 
the performance of TCP or real-time applications signifi- 
cantly iflTl . 
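
To make the distinction concrete, the following minimal sketch compares the average delay with two delay quantiles. The delay values are hypothetical numbers chosen for illustration, and `delay_quantile` is a helper name introduced here, not part of VPM.

```python
def delay_quantile(delays_ms, percent):
    """Smallest observed delay d such that at least `percent`% of packets had delay <= d."""
    ordered = sorted(delays_ms)
    # Integer ceiling of percent * n / 100 avoids floating-point rank errors.
    rank = max(1, (percent * len(ordered) + 99) // 100)
    return ordered[rank - 1]

# Hypothetical per-packet delays within one domain:
# 980 packets take 2 ms, 20 packets are caught in a 150 ms spike.
delays = [2.0] * 980 + [150.0] * 20

mean_delay = sum(delays) / len(delays)
p90 = delay_quantile(delays, 90)
p99 = delay_quantile(delays, 99)

print(f"mean = {mean_delay:.2f} ms, 90th pct = {p90} ms, 99th pct = {p99} ms")
```

In this made-up trace the average stays below 5 msec and the 90th percentile is 2 msec, yet the 99th percentile exposes the 150 msec spike that the average hides.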

2. Verifiability If domain X in path P produces dishonest
information, its neighbors in P can detect and discard
that information. 

3. Tunability The amount of resources consumed in 
collecting and disseminating information is locally tun- 
able by each HOP, such that the accuracy of the statistics 
computed from this information degrades gracefully with 
the amount of resources spent to collect and disseminate 
it. 

2.3 Assumptions 

In addition to our threat model, we make the following 
assumptions: 

(1) Our strongest assumption is that the HOP path over 
which traffic between the same source and destination ori- 
gin prefix is routed changes only slowly (i.e., on the order 
of hours, rather than seconds). This is largely the case 
today for domain-level paths over short time scales. Note 
that this does not restrict how a domain load-balances traf- 
fic internally. Each domain is free to split traffic through 
multiple internal paths in any way it wants, as long as it 
forwards all traffic with the same source/destination pre- 
fixes via the same egress link. 

(2) We assume that there exists a way for a domain in 
path P to disseminate receipts to all other domains in P, 
such that the authenticity and integrity of each received re- 
ceipt is guaranteed. One way of realizing this assumption 
would be for each domain to make its receipts available at 
an administrative web-site and accessible over HTTPS. It 
is possible to design more efficient dissemination mechanisms,
but that is outside the scope of this paper.

(3) Finally, we assume that each domain has some 
network equipment (routers or other middleboxes) that 
can perform simple per-packet operations at wire speed.
Those include packet timestamp generation, arithmetic
calculations or digest computations on packet headers and
a small portion of packet payload, and modification of local
state in a buffer. This assumption is well justified by
current trends in production routers, as well as the increas- 
ing focus of academia and industry on programmable 
routers and switches ifTSl lTl. 

3 Why a New Protocol 

There already exist many good techniques for measuring 
network performance [8] [20] [12] [15]. So, instead of describing
VPM from scratch, we first build, in this section,
"obvious" solutions by combining or extending existing
techniques, and describe why each of these solutions fails 
to meet the three conditions of our problem statement. We 
close with an overview of VPM and how it relates to the 
existing techniques. 

3.1 Strawman 

As a first-cut, strawman solution, we consider the following
modest extension to the Packet Obituaries protocol [3]:
Each HOP produces a receipt for every single
packet it observes. A receipt consists of a digest for
the corresponding packet and the timestamp for when the 
packet was observed. Each receipt is made available to all 
the domains that observed the packet. 

Computability The strawman easily meets this condi- 
tion, as a receipt collector in possession of all the (honest) 
receipts generated by a domain X can determine whether 
each packet that entered X was dropped within X and, if 
not, by how much it was delayed within X. By combining 
such information for multiple packets, the receipt collec- 
tor can easily compute aggregate loss statistics and delay 
quantiles for X. 
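
As a rough illustration of this computation, the sketch below derives a loss rate and a delay quantile for domain X from the per-packet receipts of its entry and exit HOPs. The receipt representation (a dictionary from packet digest to a timestamp in microseconds) and the function name `strawman_stats` are assumptions made for this example only.

```python
def strawman_stats(ingress, egress, percent=90):
    """Loss rate and delay quantile for one domain, from per-packet receipts.

    ingress / egress: dict mapping packet digest -> timestamp (microseconds)
    at the domain's entry and exit HOP (assumed receipt format).
    """
    lost = [d for d in ingress if d not in egress]
    delays = sorted(egress[d] - ingress[d] for d in ingress if d in egress)
    loss_rate = len(lost) / len(ingress)
    if not delays:
        return loss_rate, None
    rank = max(1, (percent * len(delays) + 99) // 100)  # integer ceiling
    return loss_rate, delays[rank - 1]

# Hypothetical receipts from HOP 4 (entry into X) and HOP 5 (exit from X).
hop4 = {"a": 0, "b": 1000, "c": 2000, "d": 3000}
hop5 = {"a": 2000, "b": 4000, "d": 13000}   # packet "c" never left X

loss_rate, p90_delay_us = strawman_stats(hop4, hop5)
print(loss_rate, p90_delay_us)
```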

Verifiability The strawman also meets this condition:
To hide a loss or delay incident, a domain has to falsely
put the blame for the incident on one of its neighbors, 
which results in inconsistent claims between the two domains.
For instance, suppose domain X receives packet
p from domain L but drops it before delivering it to domain
N. If X is dishonest and wants to hide the fact that
it dropped p, it can put the blame on N, i.e., falsely claim
having delivered p to N. This claim will be inconsistent
with N's claim of not having received p.

Such an inconsistency can be due either to a lie or to 
a faulty inter-domain link. If a receipt collector receives 
inconsistent claims from two neighbors, it discards the 
corresponding receipts (from both neighbors) and noti- 
fies both of them of the inconsistency. The two involved 
neighbors can then debug their inter-domain link; if it is 
functioning correctly, then the inconsistency was due to 
a lie, and the lying domain is exposed to the neighbor it 
implicated. For instance, if X falsely reports having de- 
livered packet p to N, but N correctly reports not having 
received p, the rest of the world cannot determine whether 
X or N is lying, but N does know that X is the liar.
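
The consistency check performed by a receipt collector can be sketched as follows. This is an illustration under stated assumptions: receipts are reduced to sets of packet digests per HOP, and the list of inter-domain links (1-2, 3-4, 5-6, 7-8) is our reading of the example path in Figure 1, where HOPs 4 and 5 belong to domain X.

```python
def find_inconsistencies(receipts, links):
    """Return, per inter-domain link, the packets claimed delivered upstream
    but not reported as received downstream.

    receipts: dict HOP id -> set of packet digests the HOP claims to have observed
    links:    list of (upstream_hop, downstream_hop) pairs joined by a link
    """
    conflicts = {}
    for up, down in links:
        missing = receipts[up] - receipts[down]
        if missing:
            conflicts[(up, down)] = missing
    return conflicts

# Assumed inter-domain links along the example path S, L, X, N, D.
links = [(1, 2), (3, 4), (5, 6), (7, 8)]

# X dropped packet "p", but its exit HOP 5 falsely claims to have delivered it.
lying = {1: {"p", "q"}, 2: {"p", "q"}, 3: {"p", "q"}, 4: {"p", "q"},
         5: {"p", "q"}, 6: {"q"}, 7: {"q"}, 8: {"q"}}
conflicts = find_inconsistencies(lying, links)

# If X reports honestly, no link shows an inconsistency; the loss of "p"
# is simply visible between X's own HOPs 4 and 5.
honest = dict(lying)
honest[5] = {"q"}
no_conflicts = find_inconsistencies(honest, links)
print(conflicts, no_conflicts)
```

The collector cannot tell from the conflict on link (5, 6) alone whether X or N is lying, or whether the link is faulty; it can only discard both receipts and notify the two neighbors, as described above.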

A domain can always support a lying neighbor's claims,
but then it either has to take the blame itself for the liar's
loss/delay or falsely accuse another domain down the
path. For instance, if X falsely claims having delivered
p to N, N has the option of covering X's lie (by claiming
that it indeed received p), but then it has to claim either
that it lost p itself, or that it delivered p to D, in which
case N is exposed to D as a liar.

It is important to note that the strawman meets the verifiability
constraint only because each receipt collector
collects receipts from all HOPs on the path, and computes 
the performance of all domains. If, instead, each receipt 
collector collected receipts only from a segment of the 
path, then there would be no incentive for domains to be 
honest about their neighbors' performance. For instance, 
suppose domain L wants to compute domain X's perfor- 
mance but collects receipts only from HOPs 3, 4, and 5. 
Suppose domain X drops packet p and falsely claims hav- 
ing delivered p to N. In this case, N can safely cover X's 
lie, i.e., claim having received p. Since domain L does not 
collect receipts beyond HOP 5, it has no way of computing
N's performance and verifying it against D's receipts.
Hence, N can collude with X and cover its lie without any 
harm to its own reputation. 

Tunability This is where the strawman fails. The cost of 
maintaining and propagating per-packet receipts, though 
not intractable, can be expensive in buffering space, pro- 
cessing, and reporting bandwidth. Different domains may 
have different resources they are willing to devote to 
a self-reporting endeavor, and keeping per-packet state 
leaves no room for tuning. 

3.2 Trajectory Sampling ++ 

Since the main problem with the strawman is the non- 
tunable cost of collecting and exchanging per-packet state, 
the first solution that comes to mind is to sample, i.e., col- 
lect information not on all packets, but on a representative 
subset, and use it to infer statistics for the rest. Hence, we 
next consider a combination of the strawman and Trajectory
Sampling [8] (we call it "Trajectory Sampling ++").
Each HOP applies a uniform hash function to a small, 
fixed portion of each observed packet. If the outcome ex- 
ceeds a pre-configured threshold, then the packet is sam- 
pled and the HOP produces a receipt for it. Each pair of 
HOPs from the same domain use the same hash function 
and sampling threshold, hence sample the same packets. 
Each receipt is made available to all the domains that ob- 
served the corresponding packet. 
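
A minimal sketch of this hash-based sampling follows. SHA-256 stands in for the "uniform hash function" and the 5% sampling rate is an arbitrary choice; both are assumptions for illustration, as are the helper names. The last line is a naive loss estimate from the sampled packets.

```python
import hashlib

def hash32(data: bytes) -> int:
    """Map packet bytes to a roughly uniform 32-bit value (SHA-256 as a stand-in)."""
    return int.from_bytes(hashlib.sha256(data).digest()[:4], "big")

def is_sampled(invariant_bytes: bytes, threshold: int) -> bool:
    """A packet is sampled when its hash exceeds the pre-configured threshold."""
    return hash32(invariant_bytes) > threshold

SAMPLING_RATE = 0.05                              # assumed rate
THRESHOLD = int((1 - SAMPLING_RATE) * 2**32)

# Hypothetical traffic: 10000 packets enter the domain, every tenth one is dropped inside.
entered = [f"pkt-{i}".encode() for i in range(10000)]
delivered = [p for i, p in enumerate(entered) if i % 10 != 0]

# Entry and exit HOP use the same hash and threshold, so they pick the same packets.
sampled_in = {p for p in entered if is_sampled(p, THRESHOLD)}
sampled_out = {p for p in delivered if is_sampled(p, THRESHOLD)}

estimated_loss = 1 - len(sampled_out) / len(sampled_in)
print(len(sampled_in), len(sampled_out), round(estimated_loss, 3))
```

Because the decision depends only on packet content and a shared threshold, the exit HOP samples exactly those sampled packets that survived the domain.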

Computability This condition is met, both for loss and 
delay statistics. First, a receipt collector in possession 
of all the (honest) receipts produced by a domain X can 
count how many of the sampled packets were lost within 
X; from that, it can estimate how many packets were lost 
within X overall, as shown in [20]. Similarly, the receipt
collector can compute the delay incurred by each sampled
packet within X, then estimate delay quantiles for
the overall traffic.



Verifiability This is where Trajectory Sampling ++
fails, and we will argue that this failure is inherent to any 
sampling-based solution. 

The obvious problem with sampling is that a domain
can lie about its performance by biasing the sampling process.
Since a domain's performance is estimated based on
how it treats the sampled packets, if domain X treats the 
sampled packets preferentially (i.e., assigns them to high- 
priority queues), then X's estimated performance will be 
higher than its actual performance. 

At first thought, such cheating seems easy to detect,
as long as not all HOPs sample the same packets. We illustrate
with an example. Suppose HOPs 4 and 5 from
Figure 1 sample one set of packets, s1, whereas HOPs 3
and 6 sample a different set of packets, s2. Suppose domain
L wants to estimate domain X's performance and
collects receipts from all HOPs. First, L uses the receipts 
from HOPs 4 and 5 to estimate the loss and delay in- 
curred between these two HOPs. Similarly, L uses the 
receipts from HOPs 3 and 6 to estimate the loss and delay 
incurred between them. If the two sets of statistics do not 
match (e.g., the estimated loss between HOPs 4 and 5 is 
significantly lower than the estimated loss between HOPs 
3 and 6), then: either one or both of the involved inter- 
domain links are malfunctioning, or domain X is biasing 
its samples to exaggerate its performance, or domain N 
is biasing its samples to misrepresent X's performance. 
Hence, one could argue, as long as not all HOPs sample 
the same packets (hence, not all HOPs have a reason to 
bias the same traffic), we can get incentives similar to those of
the strawman, i.e., lies lead to inconsistencies, and liars
are exposed to their neighbors. 

The main problem with this argument is that it assumes 
that domain X (i.e., HOPs 4 and 5) treats the packets 
from set s1 preferentially, but the packets from s2 normally
(like the rest of the traffic); yet there is a clear incentive
here for domains X and N to collude and treat both
sets of sampled packets preferentially, such that they make 
consistent claims, and the statistics computed from their 
receipts overestimate the performance of both of them. 
There are also other problems, less fundamental, but po- 
tentially significant in practice: This approach requires 
HOPs from different domains (in our example, HOPs 3 
and 6) to agree to sample the same packets. Moreover, 
an "inconsistency" is now a difference in a probabilistic 
estimate — not a concrete disagreement about a particular 
packet as in the strawman. 

To conclude, when each domain's performance is es- 
timated based on how it treats sampled packets, then a 
sequence of interconnected domains have an incentive to 
collude and bias the samples taken by all of them. In 
contrast, when domains provide receipts for every single 
packet, there is no incentive for such misbehavior, because 
colluding with a neighbor to cover the neighbor's failures 
necessarily means taking the blame yourself. 

An explanation of why the "Secure Sampling" technique
from [11] does not address this problem can be
found in Section 8.



3.3 Difference Aggregator ++ 

An alternative way of introducing tunability in the straw- 
man is to aggregate, i.e., collect information not for in- 
dividual packets, but for groups of packets. The bene- 
fit of aggregation versus sampling is that each domain 
produces information that depends on all the packets it 
observes, hence there is no straightforward way to cheat 
by treating certain packets preferentially. Hence, we next 
consider the following combination of the strawman and 
Lossy Difference Aggregator [15]¹ (we call it "Difference
Aggregator ++").

Each HOP breaks the sequence of observed pack- 
ets from a given path into packet aggregates, where a 
"packet aggregate" is a set of consecutively observed 
packets. For example, if a HOP observes packet sequence 
{p1, p2, p3, p4, p5}² from path P, it may break that into
two aggregates {p1, p2, p3} and {p4, p5}. For each aggregate,
the HOP computes a packet count and an average
timestamp, and stores them in a receipt, together with an 
identifier for the aggregate. Each receipt is made available 
to all domains that observed the corresponding aggregate. 

Moreover, each pair of HOPs from the same domain 
try to break the observed traffic into the same set of ag- 
gregates. A classic approach is to use common "cutting 
points": Each HOP applies a uniform hash function to a 
small, fixed portion of each observed packet. If the out- 
come is larger than a pre-configured threshold, then the 
packet is considered a "cutting point" and starts a new 
packet aggregate. If two HOPs use the same hash function 
and cutting threshold, and there is no packet re-ordering 
between them, then the two HOPs end up breaking the 
observed traffic into the same set of packet aggregates. 
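
The cutting-point approach can be sketched as follows. SHA-256 again stands in for the uniform hash, the threshold is arbitrary, and identifying an aggregate by its first packet is an assumption made for this example.

```python
import hashlib

def is_cutting_point(pkt: bytes, threshold: int) -> bool:
    """A packet whose hash exceeds the threshold starts a new aggregate."""
    value = int.from_bytes(hashlib.sha256(pkt).digest()[:4], "big")
    return value > threshold

def split_into_aggregates(packets, threshold):
    """Break an in-order packet sequence into consecutive aggregates."""
    aggregates, current = [], []
    for pkt in packets:
        if current and is_cutting_point(pkt, threshold):
            aggregates.append(current)
            current = []
        current.append(pkt)
    if current:
        aggregates.append(current)
    return aggregates

def receipts(aggregates):
    """One (aggregate identifier, packet count) pair per aggregate."""
    return [(agg[0], len(agg)) for agg in aggregates]

THRESHOLD = 2**32 - 2**29          # roughly one packet in eight is a cutting point
stream = [f"pkt-{i}".encode() for i in range(400)]

at_hop4 = split_into_aggregates(stream, THRESHOLD)

# Drop one packet that is not a cutting point (the last packet of some aggregate).
victim = next(agg[-1] for agg in at_hop4 if len(agg) >= 2)
at_hop5 = split_into_aggregates([p for p in stream if p != victim], THRESHOLD)

r4, r5 = receipts(at_hop4), receipts(at_hop5)
print(sum(c for _, c in r4) - sum(c for _, c in r5))
```

With no reordering and no lost cutting points, both HOPs produce the same aggregate boundaries, and the lost packet shows up only as a difference of one in a single aggregate's packet count.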

Computability Difference Aggregator ++ fails to meet 
the computability condition in two ways. First, it can- 
not provide meaningful statistics in the face of packet re- 
ordering. Second, even if there is no packet reordering, it 
cannot provide sufficient information for estimating delay 
quantiles — only for computing loss and estimating aver- 
age delay. 

Let's assume, temporarily, that there is no packet re- 
ordering within domain X. In this case, a receipt collec- 
tor in possession of the (honest) receipts produced by X 
can compute the loss incurred by each packet aggregate 
a within X, by comparing the packet counts collected for 
a at HOPs 4 and 5. By combining such information for 
multiple aggregates, one can precisely compute the loss 
incurred by the overall traffic within X. Less obviously,
by taking into account only the aggregates that did not incur
any packet loss, one can estimate the average delay
incurred by the overall traffic within X [15].

¹ We could have equally considered a combination of the strawman
and the "Secure Sketch" technique from [12]. The conclusion would
have been the same. For a comparison with that work, see Section 8.

² In reality, a HOP would observe infinite packet sequences. In our
examples, we use finite sequences for simplicity.

On the other hand, there isn't sufficient information for 
computing delay quantiles for domain X, i.e., we cannot 
make statements of the form "90% of the packets incurred 
delay below 10 msec within X." The only technique that
we are aware of for computing delay quantiles for a domain
requires knowing the delay incurred by individual
packets within that domain [20]. Intuitively, this makes
sense: An extreme example of a delay quantile is the max- 
imum delay incurred by a packet aggregate within X. Un- 
like average delay, maximum delay cannot be computed 
without collecting per-packet information at the entrance 
and exit of X. 
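
The following sketch illustrates both halves of this argument under simplifying assumptions: each receipt is reduced to a (packet count, average timestamp in microseconds) pair keyed by an aggregate identifier, and all names and values are hypothetical.

```python
def aggregate_stats(entry, exit_):
    """Packets lost, and average delay estimated from loss-free aggregates only.

    entry / exit_: dict aggregate id -> (packet_count, average_timestamp_us),
    as reported by the domain's entry and exit HOP (assumed receipt format).
    """
    lost = sum(entry[a][0] - exit_[a][0] for a in entry)
    lossless = [a for a in entry if entry[a][0] == exit_[a][0]]
    packets = sum(entry[a][0] for a in lossless)
    if packets == 0:
        return lost, None
    weighted = sum(entry[a][0] * (exit_[a][1] - entry[a][1]) for a in lossless)
    return lost, weighted / packets

hop4 = {"A": (4, 1000.0), "B": (3, 5000.0)}
hop5 = {"A": (4, 3000.0), "B": (2, 7500.0)}   # one packet of aggregate B was lost
lost, avg_delay_us = aggregate_stats(hop4, hop5)

# Two different per-packet delay patterns for aggregate A that yield the
# same average timestamps, hence the same receipts:
smooth = [2000, 2000, 2000, 2000]
spiky = [500, 500, 500, 6500]
same_average = sum(smooth) / 4 == sum(spiky) / 4
print(lost, avg_delay_us, same_average, max(smooth), max(spiky))
```

The receipts for aggregate A are identical in the smooth and the spiky case, so no computation on them can recover the maximum delay (or any other quantile) of the individual packets.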

Now let's assume that there is packet reordering within 
domain X. In this case, the receipt collector cannot even 
compute the loss and average delay incurred within X, 
because there is no guarantee that HOPs 4 and 5 will break 
observed traffic into the same aggregates. 
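
A small worked example shows the effect. To keep it deterministic, the set of cutting points is given explicitly rather than derived from a hash; this and the packet names are assumptions for illustration.

```python
def split(packets, cutting_points):
    """Break a packet sequence into aggregates; a cutting point starts a new one."""
    aggregates, current = [], []
    for pkt in packets:
        if current and pkt in cutting_points:
            aggregates.append(current)
            current = []
        current.append(pkt)
    if current:
        aggregates.append(current)
    return aggregates

cutting_points = {"p4"}                      # stands in for the hash outcome

at_hop4 = ["p1", "p2", "p3", "p4", "p5", "p6"]
at_hop5 = ["p1", "p2", "p4", "p3", "p5", "p6"]   # p3 and p4 reordered inside X, no loss

counts_hop4 = [len(a) for a in split(at_hop4, cutting_points)]
counts_hop5 = [len(a) for a in split(at_hop5, cutting_points)]
print(counts_hop4, counts_hop5)
```

Although no packet was lost, the per-aggregate counts differ (3 and 3 at HOP 4 versus 2 and 4 at HOP 5), so comparing counts aggregate by aggregate would report a loss in the first aggregate that never happened.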

3.4 Recap 

A simple protocol (like the strawman), where each do- 
main produces receipts for each packet it receives and de- 
livers, provides sufficient information for computing and 
verifying each domain's loss/delay performance; how- 
ever, the amount of resources required to store, process, 
and report per-packet state is (significantly) more than a 
typical domain can afford today. An aggregation-based 
protocol (like Difference Aggregator ++), where each domain
produces per-aggregate receipts, introduces tunable
cost, but is susceptible to packet reordering and does 
not provide sufficient information for estimating delay 
quantiles — only for computing loss and estimating aver- 
age delay. Finally, a sampling-based protocol (like Trajec- 
tory Sampling ++), where each domain produces receipts 
for sampled packets, does provide sufficient information 
for estimating loss and delay quantiles and introduces tun- 
able cost, yet is susceptible to sampling bias. 

3.5 VPM Overview 

VPM employs both sampling and aggregation — sampling 
to provide probabilistic delay-quantile measurements and 
aggregation to provide precise loss measurements. 

VPM's sampling component shares elements with Tra- 
jectory Sampling ++ (HOPs produce receipts for a subset 
of observed packets and choose which packets to sam- 
ple using hash functions), but prevents sampling bias in 
the following way. The sampling function is keyed using 
future traffic, making the samples unpredictable. Specif- 
ically, a domain does not know whether it will have to 
report measurements on a particular packet until after it 
has forwarded that packet to its downstream neighbor. As
a result, an unscrupulous domain has no way to decide 
whether to "sugarcoat" its performance by preferentially 
treating particular packets. 
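
One hypothetical way to realize "keying with future traffic" is sketched below. It is not the mechanism specified later in the paper: the marker rule, the buffer of undecided packets, the thresholds, and the class name `DelayedSampler` are all assumptions made here to illustrate the idea that the sampling decision for a packet depends on a packet that arrives after it.

```python
import hashlib

def _h(data: bytes) -> int:
    """32-bit hash value (SHA-256 as a stand-in for a fast uniform hash)."""
    return int.from_bytes(hashlib.sha256(data).digest()[:4], "big")

class DelayedSampler:
    """Decide whether to sample a packet only when a later 'marker' packet arrives."""

    def __init__(self, marker_threshold: int, sample_threshold: int):
        self.marker_threshold = marker_threshold
        self.sample_threshold = sample_threshold
        self.pending = []   # (packet bytes, observation time), already forwarded
        self.samples = []   # (packet id, observed_at, decided_at)

    def observe(self, pkt: bytes, timestamp: int) -> None:
        if _h(b"marker" + pkt) > self.marker_threshold:
            # pkt is future traffic relative to every pending packet:
            # its content keys the sampling decision for all of them.
            for old_pkt, old_ts in self.pending:
                if _h(pkt + old_pkt) > self.sample_threshold:
                    pkt_id = hashlib.sha256(old_pkt).hexdigest()[:16]
                    self.samples.append((pkt_id, old_ts, timestamp))
            self.pending.clear()
        self.pending.append((pkt, timestamp))

MARKER_T = 2**32 - 2**28     # about 1 in 16 packets acts as a marker (assumed)
SAMPLE_T = 2**32 - 2**29     # about 1 in 8 pending packets gets sampled (assumed)

stream = [f"pkt-{i}".encode() for i in range(2000)]
hop_a, hop_b = DelayedSampler(MARKER_T, SAMPLE_T), DelayedSampler(MARKER_T, SAMPLE_T)
for t, pkt in enumerate(stream):
    hop_a.observe(pkt, t)
    hop_b.observe(pkt, t)
print(len(hop_a.samples))
```

In this sketch, two HOPs that see the same packet sequence reach the same decisions, yet neither can know at forwarding time whether a given packet will be reported.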



VPM's aggregation component shares elements with 
Difference Aggregator ++ (HOPs produce receipts for 
packet aggregates and choose where to break each aggre- 
gate using hash functions), but provides accurate statistics 
in the face of packet reordering. This is achieved by pro- 
viding, on top of per-aggregate receipts, extra per-packet 
information for a small window around the cutting points 
between packet aggregates. 

One could ask, why use both sampling and aggrega- 
tion? After all, using sampling we can estimate both loss 
and delay quantiles (provided we fix the sample bias is- 
sue), so why use aggregation at all? One reason is that 
aggregation provides precise (as opposed to probabilistic) 
loss measurements and, as we will see, once we have de- 
ployed the sampling component, the incremental cost of 
adding the aggregation component is trivial. Another rea- 
son is to add extensibility to our mechanism. Even though 
we do not consider this scenario in this paper, "bad" ISP 
behavior may consist not only of introducing loss and un- 
predictable delay, but also of modifying traffic; the only 
way to detect such behavior is to use a content-processing 
technique like the one proposed in [12], which could be
easily incorporated in our aggregation component, but not 
in a sampling-only mechanism. 

4 Voluntary Reporting 

In this section, we describe what kind of information 
VPM domains produce and how that information is used 
to estimate and verify their performance. We do not worry 
about how this information is generated — we defer that to 
the next two sections. 

Traffic Receipts Each VPM HOP generates receipts for 
the traffic it observes. There are two kinds of receipts: 

1. A receipt for a set of sampled packets has form

$R = \langle \mathit{PathID}, \mathit{Samples} \rangle$.

2. A receipt for a packet aggregate has form

$R = \langle \mathit{PathID}, \mathit{AggID}, \mathit{PktCnt} \rangle$.

PathID specifies the HOP path to which the corresponding
sampled packets or packet aggregate belongs. It has form

$\langle \mathit{HeaderSpec}, \mathit{PreviousHOP}, \mathit{NextHOP}, \mathit{MaxDiff} \rangle$.

HeaderSpec specifies which part of a packet's headers is
used to identify the packet's path; it includes at least a
source and destination origin-prefix pair. PreviousHOP
and NextHOP specify the previous and next HOPs on
this path. MaxDiff is a value agreed upon between the
reporting HOP and the HOP that is at the other end of the
same inter-domain link (e.g., HOPs 3 and 4 in Figure 1).
It is meant to upper-bound the difference in timestamps
one should expect between the two HOPs.

Samples is a sequence of $\langle \mathit{PktID}, \mathit{Time} \rangle$ records, each
corresponding to a single sampled measurement. The
packet identifier PktID is a digest of the packet's headers.
Time specifies when the corresponding packet was
observed at the HOP. The aggregate identifier AggID
consists of the packet IDs of the first and last packets of the
aggregate. PktCnt is the number of packets observed by
the HOP within this aggregate.

Upon receiving a packet, each HOP classifies it into 
a HOP path and an aggregate, counts it against that ag- 
gregate's packet count, and decides whether to sample it. 
Periodically, the HOP generates traffic receipts for all the 
sampled packets and aggregates it has observed since the 
last reporting time, which it disseminates to all domains 
that observed the corresponding traffic. 

Receipt-based Statistics Consider HOPs 4 and 5 in
Figure 1 and suppose we collect all their receipts. We
now describe the types of statistics we can compute from
these receipts.

Suppose HOPs 4 and 5 use the same sampling algorithm,
i.e., if one HOP samples a packet p, the other HOP
also samples p (provided p is not lost before reaching the
HOP). If the two HOPs generate for p receipts $R_4$ and
$R_5$, respectively, then the packet's delay through X was
$R_5.\mathit{Time} - R_4.\mathit{Time}$. By computing the delay
experienced by the sampled packets within X, we can estimate
upper and lower bounds for the delay experienced by all
packets within X [20].

Now suppose HOPs 4 and 5 use the same aggregation
algorithm. If the two HOPs generate for the same packet
aggregate a receipts $R_4^a$ and $R_5^a$, respectively, then X lost
$R_4^a.\mathit{PktCnt} - R_5^a.\mathit{PktCnt}$ packets of the aggregate.
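
The two statistics above can be sketched in a few lines. This is only an illustration under stated assumptions: the class and function names (SampleReceipt, AggregateReceipt, sampled_delays, aggregate_loss) are hypothetical, PktIDs are plain integers, and timestamps are floating-point seconds. It yields per-packet delays only; the bound estimation of [20] is not reproduced here.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class SampleReceipt:
    path_id: str
    samples: List[Tuple[int, float]]  # (PktID, Time) records


@dataclass
class AggregateReceipt:
    path_id: str
    agg_id: Tuple[int, int]  # (first PktID, last PktID)
    pkt_cnt: int


def sampled_delays(ingress: SampleReceipt, egress: SampleReceipt) -> Dict[int, float]:
    """Delay of every packet that was sampled at both HOPs."""
    t_in = dict(ingress.samples)
    return {pkt: t_out - t_in[pkt]
            for pkt, t_out in egress.samples if pkt in t_in}


def aggregate_loss(ingress: AggregateReceipt, egress: AggregateReceipt) -> int:
    """Packets of one aggregate lost between the two HOPs."""
    if ingress.agg_id != egress.agg_id:
        raise ValueError("receipts describe different aggregates")
    return ingress.pkt_cnt - egress.pkt_cnt


# Hypothetical receipts from HOP 4 (ingress) and HOP 5 (egress) of domain X.
r4 = SampleReceipt("P", [(1, 0.0), (2, 1.0), (3, 2.0)])
r5 = SampleReceipt("P", [(1, 0.5), (3, 2.75)])  # packet 2 never arrived
delays = sampled_delays(r4, r5)
lost = aggregate_loss(AggregateReceipt("P", (1, 100), 100),
                      AggregateReceipt("P", (1, 100), 97))
```

Packets sampled at only one HOP (here packet 2) contribute no delay value; they would show up only in the loss computation.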

Receipt Combination Receipts of either kind can be
combined with others from the same HOP to generate
receipts of a larger sample set or coarser aggregate. For
sampling receipts, combination is straightforward:

$\biguplus_i R_i = \left\langle \mathit{PathID}, \bigcup_i \mathit{Samples}_i \right\rangle$.

For aggregate receipts, consider N consecutive aggregates,
$a_i, i = 1..N$, from the same path, and the N
receipts, $R^{a_i} = \langle \mathit{PathID}, \mathit{AggID}_i, \mathit{PktCnt}_i \rangle$, produced
for these aggregates by a single HOP. We define the
combination of these receipts as

$\biguplus_i R^{a_i} = \left\langle \mathit{PathID}, \mathit{AggID}, \sum_i \mathit{PktCnt}_i \right\rangle$,

where AggID is the identifier (first and last packet digest)
of the union of all N aggregates.
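
As a rough sketch of the two combination rules, the following code merges receipts from a single HOP. Receipts are represented as plain tuples, and the function names are hypothetical; the aggregate receipts are assumed to be passed in path order and to cover consecutive aggregates.

```python
from typing import List, Tuple

SampleReceipt = Tuple[str, List[Tuple[int, float]]]  # (PathID, Samples)
AggReceipt = Tuple[str, Tuple[int, int], int]        # (PathID, AggID, PktCnt)


def combine_sample_receipts(receipts: List[SampleReceipt]) -> SampleReceipt:
    """Union of the sample records of several receipts on one path."""
    path_ids = {path for path, _ in receipts}
    if len(path_ids) != 1:
        raise ValueError("receipts must share one PathID")
    merged, seen = [], set()
    for _, samples in receipts:
        for record in samples:
            if record not in seen:  # keep each (PktID, Time) record once
                seen.add(record)
                merged.append(record)
    return path_ids.pop(), merged


def combine_aggregate_receipts(receipts: List[AggReceipt]) -> AggReceipt:
    """Receipt for the union of consecutive aggregates: the new AggID is
    (first packet of the first aggregate, last packet of the last one)
    and the packet counts add up."""
    path_ids = {path for path, _, _ in receipts}
    if len(path_ids) != 1:
        raise ValueError("receipts must share one PathID")
    first = receipts[0][1][0]
    last = receipts[-1][1][1]
    total = sum(cnt for _, _, cnt in receipts)
    return path_ids.pop(), (first, last), total


combined_samples = combine_sample_receipts(
    [("P", [(1, 0.1)]), ("P", [(2, 0.2), (1, 0.1)])])
combined_agg = combine_aggregate_receipts(
    [("P", (10, 19), 10), ("P", (20, 35), 16)])
```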

Receipt Consistency Consider two receipts, $R_5$ and
$R_6$, for the same sampled packet p, produced by two
HOPs on opposite ends of the same inter-domain link
(e.g., HOPs 5 and 6 in Figure 1). The two receipts are
considered consistent with each other when all of the
following hold:

$R_5.\mathit{PathID}.\mathit{MaxDiff} = R_6.\mathit{PathID}.\mathit{MaxDiff}$   (1)

$R_6.\mathit{Time} - R_5.\mathit{Time} \le R_5.\mathit{PathID}.\mathit{MaxDiff}$   (2)



These rules express the fact that a correct inter-domain 
link does not introduce unpredictable delay: the time at 
which a sampled packet is delivered by one HOP and re- 
ceived by the other should differ at most by a predictable 
MaxDiff, set during configuration of that link by the two 
involved domains. 

Now consider two receipts, $R_5$ and $R_6$, for the same
packet aggregate a, produced by two HOPs on opposite
ends of the same inter-domain link. The two receipts are
considered consistent with each other when:

$R_5.\mathit{PktCnt} = R_6.\mathit{PktCnt}$

This rule represents the fact that a correct inter-domain 
link does not introduce packet loss — hence, the number of 
packets delivered by one HOP and received by the other 
should be the same. 

If a receipt collector gets inconsistent receipts from
two neighbors, it discards both receipts and notifies both
neighbors of the inconsistency, such that the liar is
exposed to the neighbor it implicated, as in the strawman.
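
A receipt collector's consistency checks can be sketched as follows. The function names and argument layout are hypothetical; the sketch assumes that the delivering HOP's timestamp is subtracted from the receiving HOP's timestamp and that timestamps and MaxDiff share one unit.

```python
def samples_consistent(t_deliver: float, t_receive: float,
                       max_diff_deliver: float, max_diff_receive: float) -> bool:
    """Check two sample receipts for the same packet, taken at the two
    ends of one inter-domain link."""
    # Both HOPs must report the same agreed-upon MaxDiff.
    if max_diff_deliver != max_diff_receive:
        return False
    # The link must not add more than MaxDiff between the two timestamps.
    return (t_receive - t_deliver) <= max_diff_deliver


def aggregates_consistent(cnt_deliver: int, cnt_receive: int) -> bool:
    """A correct inter-domain link loses no packets of an aggregate."""
    return cnt_deliver == cnt_receive


ok = samples_consistent(10.0, 10.25, 0.5, 0.5)
too_slow = samples_consistent(10.0, 11.0, 0.5, 0.5)
mismatch = samples_consistent(10.0, 10.25, 0.5, 0.75)
```

When either check fails, the collector cannot tell which neighbor lied; it can only discard both receipts and notify both neighbors, as described above.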

(No) Clock Synchronization VPM does not require
that HOPs have synchronized clocks. However, it is in
a participating domain's best interest to keep its HOPs
reasonably synchronized (e.g., at the granularity of a
millisecond, achievable with NTP [5]), since its delay
performance will be estimated based on the timestamps reported
by different HOPs. Moreover, it is in two neighboring
domains' best interest to keep adjacent HOPs (like 3 and
4 in Figure 1) reasonably synchronized, otherwise their
timestamp difference will exceed the reported MaxDiff
and the two neighbors will generate inconsistent receipts
(hence appear to have a problematic inter-domain link or
be involved in a lie).

We should note that domains are free to report arbitrar- 
ily large MaxDiff values: nothing prevents HOPs 3 and 
4 from keeping de-synchronized clocks and reporting a 
MaxDiff of several seconds between them. That, how- 
ever, does make it look like they are connected through 
an awfully slow inter-domain link — not a good feature to 
advertise to their customers and peers. 

5 Bias-resistant, Tunable Sampling 

We now describe how each HOP chooses which packets
to sample. Our sampling algorithm prevents domains
from exaggerating their performance by biasing their
samples (§5.1); it maximizes the number of packets that
are commonly sampled by all HOPs that observe them,
while allowing each HOP to choose its own sampling
rate (§5.2), even in the face of loss and packet reordering
(§5.3).

Algorithm 1 DelaySample(p, $\mu$, $\sigma$)

Input p                          // new packet
Input $\mu$                      // marker threshold
Input $\sigma$                   // sampling threshold
Initially TempBuffer ← ∅         // packet buffer
Initially R ← ∅                  // current receipt

1: if Digest(p) > $\mu$ then
2:    for all packets q in TempBuffer do
3:       if SampleFcn(Digest(q), Digest(p)) > $\sigma$ then
4:          Add ⟨Digest(q), Time(q)⟩ to R.Samples
5:    Empty TempBuffer
6:    Add ⟨Digest(p), Time(p)⟩ to R.Samples
7: else
8:    Add Digest(p) to TempBuffer

5.1 Bias Resistance 

Instead of sampling packets in real time, each HOP main- 
tains state on all observed packets, but only for a fixed, 
short period of time (ten milliseconds or so). After that 
period of time has elapsed, the HOP is told which of the 
stored per-packet state to keep and which to discard. Since 
an ISP learns whether a packet's fate will affect estimates 
of its performance only after it has forwarded that packet, 
it cannot treat sampled packets preferentially. 

A dishonest HOP could, in theory, store every single
packet, wait to learn whether the packet has to be
sampled, then decide how to treat the packet. However, that
means delaying all traffic at the HOP by ten milliseconds
or so (an order of magnitude above the delay introduced
by a correctly functioning router), not to mention that it
requires buffering ten milliseconds' worth of traffic, which,
for a 10Gbps interface, would require 25MB (i.e., several
chips) of expensive SRAM storage.

A key question is who tells each HOP which packets
to delay-sample. A naive approach would be to use
explicit signaling; for example, in Figure 1, domain S could
explicitly tell all HOPs in path $\mathcal{P}$ which packets to
sample from each aggregate sent from S to D along $\mathcal{P}$. That,
however, would essentially require every source domain
to set up virtual circuits along all Internet paths that
observe its traffic. Instead, each HOP decides whether to
delay-sample a packet based on the contents of another
packet sent later on the same path. In this sense, domain S
implicitly dictates which of its packets should be sampled,
through the traffic it subsequently routes via $\mathcal{P}$ anyway.

Algorithm 1 shows what happens when a HOP observes
a new packet p from path $\mathcal{P}$; the algorithm assumes that
the HOP maintains a temporary buffer with per-packet
state for all the packets observed from $\mathcal{P}$. If the packet
satisfies a certain condition, it is chosen as a "marker" packet
(line 1). In that case, its contents determine which of the
already observed packets to sample (lines 2-4), discarding
the rest (line 5). The marker packet itself is also sampled
(line 6). Observe that HOPs maintain state for all packets
only during the short period of time until the next marker
packet is observed.

The marker value $\mu$, which determines which packets
are "markers," is a system-wide constant specified by
VPM at design time; when there is no loss, all HOPs in
$\mathcal{P}$ select the same packets as markers. In contrast, the
sampling threshold $\sigma$, which determines which packets
are sampled, is a local parameter, chosen independently
at each HOP. If all HOPs in $\mathcal{P}$ choose the same $\sigma$, they
all sample the same packets (modulo the packets that are
lost). We turn next to what happens when different HOPs
select different sampling thresholds.

5.2 Tunability 

Each HOP chooses its own sampling rate. At the same
time, given N HOPs observing the same packet sequence
and their sampling rates, we maximize the number of
packets that are commonly sampled by all HOPs.

The key element that enables this property is the
inequality in line 3 of Algorithm 1. Consider HOPs 1
and 2, with sampling thresholds $\sigma_1$ and $\sigma_2 < \sigma_1$.
Suppose that p is a packet sampled by HOP 1 and q
is the first marker packet observed after p by HOP 1.
Since HOP 1 samples p, this necessarily means that
$\mathit{SampleFcn}(\mathit{Digest}(q), \mathit{Digest}(p)) > \sigma_1 > \sigma_2$, which
means that HOP 2 also samples p; hence, HOP 2 samples
at least all packets sampled by HOP 1. So, even though
each HOP chooses its sampling rate independently, if
there is no packet loss or reordering, different HOPs never
sample partially overlapping packet sets.
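
The following sketch implements Algorithm 1 and illustrates the nesting property just described. Several details are assumptions made only for illustration: the digest is a truncated SHA-256 rather than any particular router hash, SampleFcn is modeled as a hash over the two digests (its actual definition is not given in this section), and the threshold values are arbitrary. The buffer also keeps the timestamp alongside each digest so that line 4 of the algorithm can emit it.

```python
import hashlib
from typing import List, Tuple

MAX_DIGEST = 2 ** 32


def digest(data: bytes) -> int:
    """32-bit packet digest (assumption: truncated SHA-256)."""
    return int.from_bytes(hashlib.sha256(data).digest()[:4], "big")


def sample_fcn(dq: int, dp: int) -> int:
    """Hypothetical SampleFcn: a hash over the buffered packet's digest
    and the marker's digest."""
    return digest(dq.to_bytes(4, "big") + dp.to_bytes(4, "big"))


class DelaySampler:
    def __init__(self, mu: int, sigma: int) -> None:
        self.mu = mu        # marker threshold, system-wide
        self.sigma = sigma  # sampling threshold, chosen locally
        self.temp_buffer: List[Tuple[int, float]] = []
        self.samples: List[Tuple[int, float]] = []

    def observe(self, pkt: bytes, time: float) -> None:
        d = digest(pkt)
        if d > self.mu:  # line 1: pkt is a marker
            for dq, tq in self.temp_buffer:          # lines 2-4
                if sample_fcn(dq, d) > self.sigma:
                    self.samples.append((dq, tq))
            self.temp_buffer.clear()                 # line 5
            self.samples.append((d, time))           # line 6
        else:
            self.temp_buffer.append((d, time))       # line 8


MU = int(0.99 * MAX_DIGEST)                         # about 1% markers
hop1 = DelaySampler(MU, int(0.99 * MAX_DIGEST))     # sigma_1, fewer samples
hop2 = DelaySampler(MU, int(0.90 * MAX_DIGEST))     # sigma_2 < sigma_1
for i in range(2000):
    pkt = f"pkt-{i}".encode()
    hop1.observe(pkt, float(i))
    hop2.observe(pkt, float(i))
```

With no loss or reordering, every record in hop1.samples also appears in hop2.samples, regardless of the hash values, because any digest pair that clears the higher threshold also clears the lower one.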

5.3 Sampling Under Loss and Reordering 

Loss and reordering decrease the number of commonly
sampled packets. E.g., if a marker packet is lost
between two HOPs, it causes them to sample arbitrarily
different packet sets for several milliseconds, until the
next marker arrives. The good news is that it takes
unlikely amounts of (non-purposeful) loss/reordering to
significantly impact the estimation accuracy of the
mechanism. For instance, in Section 7.2 we show that, if HOPs
4 and 5 sample 1% of the observed traffic, and the link
between them experiences 25% packet loss, the delay
between the two HOPs can still be estimated with an
accuracy of 2 msec. This accuracy is sufficient for verifying
today's SLAs, which typically promise intra-domain de- 
lays on the order of multiple tens of milliseconds U. 



An under-performing domain (say X in Figure 1) could
drop all marker packets, causing the next domain (N in
our example) to sample all the wrong packets; this would
ensure that X's performance is never verified according
to N's receipts. First, note that such behavior from X
is detrimental to N (because it prevents it from
producing correct receipts), hence N has a clear incentive to
expose and stop it. Second, such behavior is bound to be
exposed, because marker packets are expected to be
always sampled and reported on: if X drops a marker q, it
either has to admit dropping it or lie and be inconsistent
with N's claim that it never received q; either way, if X
consistently drops markers, it is either globally exposed
as misbehaving or locally exposed as such to N.

6 Tunable Aggregation 

We now describe how each HOP chooses which pack- 
ets to assign to the same aggregate. Like our sampling, 
our aggregation is "tunable," i.e., we allow each HOP to 
choose its own degree of aggregation, according to the lo- 
cally available resources. This raises the following chal- 
lenge: when HOPs aggregate differently, they produce 
receipts on different aggregates; how can one combine 
such receipts to estimate domain performance and
perform consistency checking? We first describe this
challenge in more detail (§6.1), then present our solution in
two parts: first assuming no loss or reordering (§6.2),
then removing this assumption (§6.3).



$A_1 = \{\{p_1\}, \{p_2\}, \{p_3\}, \{p_4\}\}$
$A_2 = \{\{p_1, p_2\}, \{p_3, p_4\}\} > A_1$
$A_3 = \{\{p_1\}, \{p_2, p_3\}, \{p_4\}\} > A_1$
$A_3' = \{\{p_1\}, \{p_2\}, \{p_3, p_4\}\} < A_2$
$A_4 = \{\{p_1, p_2, p_3, p_4\}\} > A_2, A_3$

$\mathit{Join}(A_1, A_2) = A_2$
$\mathit{Join}(A_2, A_3) = A_4$
$\mathit{Join}(A_2, A_3') = A_2$

Table 1: Different partitions of packet set $\mathcal{S} = \{p_1, p_2, p_3, p_4\}$
and some join examples. Note that not all partitions of $\mathcal{S}$ have
a ">" relationship, e.g., we cannot say that $A_2 > A_3$ nor that
$A_3 > A_2$.

Terminology and Notation: We borrow the following
terminology and notation from set theory (illustrated
through the examples of Table 1):

1. A partition of a packet set $\mathcal{S}$ is a set of non-overlapping
aggregates whose union is equal to $\mathcal{S}$. Given a
partition A of some packet set, each packet that is the
first packet of an aggregate in A is called a cutting
point. For example, $p_1$ and $p_3$ are cutting points in
$A = \{\{p_1, p_2\}, \{p_3, p_4\}\}$.

2. Suppose $A_1$ and $A_2$ are partitions of the same packet
set. We say that $A_1$ is coarser than $A_2$ (or $A_2$ is finer
than $A_1$), denoted by $A_1 > A_2$, when each aggregate
in $A_1$ is a union of aggregates in $A_2$.

More formally, we say that $A_1 > A_2$ when:
$\exists \{P_i \mid P_i \in A_2\} : \bigcup_i P_i = a, \ \forall a \in A_1$.

3. Suppose $A_i, i = 1..N$, are partitions of packet set $\mathcal{S}$.
We say that $\mathcal{J}$ is the join of $A_1, A_2, \ldots, A_N$, denoted
by $\mathcal{J} = \mathit{Join}(A_1, A_2, \ldots, A_N)$, when $\mathcal{J}$ is the finest
partition of $\mathcal{S}$ that is coarser than all $A_i$.

More formally, we say that
$\mathcal{J} = \mathit{Join}(A_1, A_2, \ldots, A_N)$ when:

$\mathcal{J} < \mathcal{J}' \ \ \forall \mathcal{J}' : \mathcal{J}' > A_i \ \forall i$,

where $\mathcal{J}'$ is also a partition of $\mathcal{S}$.
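
The join examples of Table 1 can be reproduced with a short sketch. It relies on an assumption that holds for HOP-produced aggregates but not for arbitrary set partitions: every aggregate is a contiguous run of the ordered packet sequence, so a partition is fully described by its cutting points. Under that assumption, "coarser than" becomes a subset test on cutting points and the join keeps only the cutting points common to all partitions. The function names are hypothetical.

```python
from typing import List, Set

Partition = List[List[str]]


def cutting_points(partition: Partition) -> Set[str]:
    """First packet of every aggregate."""
    return {aggregate[0] for aggregate in partition}


def is_coarser(a: Partition, b: Partition) -> bool:
    """a > b: every aggregate of a is a union of aggregates of b."""
    return cutting_points(a) <= cutting_points(b)


def join(sequence: List[str], *partitions: Partition) -> Partition:
    """Finest partition that is coarser than all the given ones."""
    common = set.intersection(*(cutting_points(p) for p in partitions))
    result: Partition = []
    for pkt in sequence:
        if pkt in common or not result:
            result.append([pkt])    # start a new aggregate
        else:
            result[-1].append(pkt)  # extend the current aggregate
    return result


S = ["p1", "p2", "p3", "p4"]
A1 = [["p1"], ["p2"], ["p3"], ["p4"]]
A2 = [["p1", "p2"], ["p3", "p4"]]
A3 = [["p1"], ["p2", "p3"], ["p4"]]
A3p = [["p1"], ["p2"], ["p3", "p4"]]  # A3' in Table 1
A4 = [["p1", "p2", "p3", "p4"]]
```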



6.1 The Partitioning Problem 

If we view all traffic sent on path $\mathcal{P}$ as a packet set $\mathcal{S}$,
then we can say that each HOP in $\mathcal{P}$ that performs packet
aggregation computes a partition of $\mathcal{S}$.

When two HOPs produce different aggregate sets from
the same packet set, a domain that collects their
receipts cannot directly perform consistency checking as
described in Section 4. However, it can try to find traffic
receipts from one HOP that, when combined, exactly
correspond to traffic receipts (and aggregates) from the other
HOP, and then proceed with the calculations and
verification from Section 4. This corresponds to computing the
join of the two aggregate sets as defined above to find the
finest aggregate set over which statistics can be computed
across the receipts from the two HOPs.

For instance, suppose two HOPs observe packet set $\mathcal{S}$
from Table 1 and, respectively, produce aggregate sets $A_2$
and $A_3$ (from the same table). A domain that collects their
receipts can combine each HOP's receipts and produce the
receipt that the HOP would have produced for the (single)
aggregate in aggregate set $A_4$. So, the two HOPs' claims
can be checked for consistency only with respect to the
aggregates in the coarser aggregate set $\mathit{Join}(A_2, A_3) = A_4$.

Although this approach is general — there is always a 
join of two aggregate sets over which a verifier can com- 
pute some combined receipts and, therefore, some per- 
formance statistics — the quality of the results varies. In- 
tuitively, we would want the join of fine-grained aggre- 
gate sets to be just as fine-grained; otherwise informa- 
tion obtained and forwarded at high resource cost would 
end up lost in translation. In the example above, the join
of $A_2$ and $A_3$ is $A_4$, a single-aggregate aggregate set,
even though the input aggregate sets and traffic receipts
afforded multiple data points each from either HOP. In
contrast, an equally "expensive" aggregate set $A_3'$ from
the second HOP would have allowed the verifier to
compare receipts on $\mathit{Join}(A_2, A_3') = A_2$, which conserves all
information from the first HOP and only combines two of
the three receipts from the second one.

Our goal then is: to design a partitioning algorithm that 
results in the finest possible join given the rate at which 
each HOP can produce new aggregates.



Algorithm 2 Partition(p, $\delta$)

Input p                    // new packet
Input $\delta$             // partition threshold
Initially R = ∅            // current receipt

1: if Digest(p) > $\delta$ then
2:    Close receipt R for aggregate R.AggID
3:    Open new receipt R ← ∅
4:    R.AggID.FirstPacketID ← p
5: R.AggID.LastPacketID ← p
6: R.PktCnt ← R.PktCnt + 1



6.2 Basic Solution 

At a high level, VPM limits domains' choice of packet ag- 
gregation so as to produce "good" aggregates with respect 
to join and combination, while allowing them to tune how 
fine their choice is. 

Algorithm 2 shows what happens when a HOP observes
a new packet p from path $\mathcal{P}$; the algorithm assumes that
the HOP maintains one "open" receipt per path. If the
packet's contents satisfy a certain condition (line 1), then
the current aggregate for path $\mathcal{P}$ is closed (line 2) and the
packet is classified in a new aggregate (line 4); otherwise,
the packet is classified in the current aggregate (line 5).
Observe that this algorithm requires constant state per ag- 
gregate and constant computation per packet (i.e., its state 
size and per-packet computation are not proportional to 
aggregate size). 

Algorithm 2 ensures that HOP 2 with partition
threshold $\delta_2$ will partition a stream at least at the same
points as HOP 1 with partition threshold $\delta_1 > \delta_2$.
So, even though each HOP chooses its partitioning
rate independently, if there is no loss or reordering,
different HOPs never produce partially overlapping
aggregate sets. For instance, if HOPs 1 and 2 from
Figure 1 observe packet sequence $\langle p_1, p_2, \ldots, p_8 \rangle$ and
have partition thresholds $\delta_1 > \delta_2$, they may respectively
produce aggregate sets $\{\{p_1, p_2, p_3, p_4\}, \{p_5, p_6, p_7, p_8\}\}$
and $\{\{p_1, p_2\}, \{p_3, p_4\}, \{p_5, p_6\}, \{p_7, p_8\}\}$,
but not $\{\{p_1, p_2, p_3, p_4\}, \{p_5, p_6, p_7, p_8\}\}$ and
$\{\{p_1\}, \{p_2, p_3\}, \{p_4, p_5\}, \{p_6, p_7\}, \{p_8\}\}$.
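
Algorithm 2 and its nesting property can be sketched as below. The digest function (a truncated SHA-256), the dictionary layout of a receipt, and the threshold values are assumptions for illustration only. One deviation from the pseudocode is flagged in the comments: the very first packet always opens a receipt, since there is nothing to close yet.

```python
import hashlib
from typing import Dict, List

MAX_DIGEST = 2 ** 32


def digest(data: bytes) -> int:
    """32-bit packet digest (assumption: truncated SHA-256)."""
    return int.from_bytes(hashlib.sha256(data).digest()[:4], "big")


def partition(packets: List[bytes], delta: int) -> List[Dict[str, int]]:
    """Split a packet stream into aggregates; one receipt per aggregate."""
    receipts: List[Dict[str, int]] = []
    current = None
    for pkt in packets:
        d = digest(pkt)
        # Line 1, plus the assumption that the first packet opens a receipt.
        if d > delta or current is None:
            if current is not None:
                receipts.append(current)                 # line 2: close
            current = {"first": d, "last": d, "pkt_cnt": 0}  # lines 3-4
        current["last"] = d                              # line 5
        current["pkt_cnt"] += 1                          # line 6
    if current is not None:
        receipts.append(current)
    return receipts


stream = [f"pkt-{i}".encode() for i in range(5000)]
hop1 = partition(stream, int(0.999 * MAX_DIGEST))  # delta_1: coarse
hop2 = partition(stream, int(0.99 * MAX_DIGEST))   # delta_2 < delta_1: fine
cuts1 = {r["first"] for r in hop1}
cuts2 = {r["first"] for r in hop2}
```

Because any digest above the higher threshold is also above the lower one, every cutting point of hop1 is a cutting point of hop2, so hop1's aggregates are unions of hop2's aggregates.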

6.3 Partitioning Under Loss and Reordering 

Loss can decrease the fine-ness of the join of the
produced aggregate sets: Suppose HOPs 1 and 2
produce aggregate sets $\{\{p_1, p_2, p_3, p_4\}, \{p_5, p_6, p_7, p_8\}\}$
and $\{\{p_1, p_2\}, \{p_3, p_4\}, \{p_5, p_6\}, \{p_7, p_8\}\}$; the join of
the two sets is $\{\{p_1, p_2, p_3, p_4\}, \{p_5, p_6, p_7, p_8\}\}$ (the
coarser of the two aggregate sets). However, if $p_5$ is
lost before HOP 2, then the latter produces aggregate
set $\{\{p_1, p_2\}, \{p_3, p_4, p_5, p_6\}, \{p_7, p_8\}\}$; now, the join of
the two sets is $\{\{p_1, p_2, p_3, p_4, p_5, p_6, p_7, p_8\}\}$ (the worst
possible in this example). So, loss can cause a combination
of aggregates that would otherwise have been split
using the lost packet as a cutting point, which, in turn,
reduces the fine-ness of the join.

The good news is that, although loss does decrease the 
fine-ness of the resulting join, the degradation is smooth, 
because the probability of coarsening the granularity of a 
measurement is conditioned on a cutting point being lost, 
not on arbitrary packet loss and, even then, not all cut- 
ting points can cause a violation of the total order when 
lost. For instance, in Section 7.2 we show that, if HOPs
4 and 5 generate an aggregate receipt for every 100,000
packets, and the link between them experiences 25% loss,
the loss between the two HOPs can still be computed
for every 150,000 packets, on average. Note that being
able to compute domain loss at such granularity is more 
than sufficient for verifying today's SLAs, which typically 
promise a certain level of packet loss per month (a dura- 
tion that corresponds to billions of packets, assuming a 
traffic rate of a few tens of Mbps along each path) H] . 

Reordering can also decrease the fine-ness of the
join of the produced aggregate sets: Consider path
$\mathcal{P}$ from Figure 1 and original packet sequence $\mathcal{S} =
\langle p_1, p_2, \ldots, p_8 \rangle$ sent along $\mathcal{P}$. Suppose HOP 1 observes
this sequence and partitions it into aggregate set $A =
\{\{p_1, p_2, p_3, p_4\}, \{p_5, p_6, p_7, p_8\}\}$. HOP 4 observes
sequence $\langle p_1, p_2, p_3, p_5, p_4, p_6, p_7, p_8 \rangle$ due to reordering
somewhere between the two HOPs. Even though it uses
the same algorithm, it partitions the sequence into $A' =
\{\{p_1, p_2, p_3\}, \{p_5, p_4, p_6, p_7, p_8\}\}$. The two aggregate
sets are not ordered according to the "finer than" relation,
so their join is the entire sequence, an undesirable effect
of reordering.

In practice, packets are reordered only when they are
transmitted close to one another (according to the most
recent Internet-wide experiment we are aware of, packets
transmitted more than half a millisecond apart were not
reordered [10]). Hence, we define, for each path $\mathcal{P}$, a safety
inter-arrival threshold J and assume that two packets that
follow $\mathcal{P}$ can be reordered only if they are observed (at
any HOP) less than J time units away from one another.
This assumption allows us to bound the coarseness of the
join at the cost of keeping extra per-aggregate state.

At a high level, we alter the mechanism of Algorithm 2
to add patch-up information in every receipt. A verifier
can use this patch-up information to make "misaligned"
receipts from different HOPs align better, thereby
enabling a better join of the corresponding aggregate sets
and consequently better-quality traffic statistics.

More specifically, a traffic receipt for a packet
aggregate also specifies the sequence of packets observed J
time units around the cutting point. In the above
example, HOP 1 reports sequence $\langle p_3, p_4, p_5, p_6 \rangle$ in its
receipt for the first aggregate, and HOP 4 reports sequence
$\langle p_2, p_3, p_5, p_4 \rangle$ in its receipt for the first aggregate. In
general, a receipt is extended from the earlier definition to be
$\langle \mathit{PathID}, \mathit{AggID}, \mathit{PktCnt}, \mathit{AggTrans} \rangle$, where AggTrans
is the sequence of packet identifiers that correspond to the
packets observed within a window of 2J from the
aggregate's last packet.

Using this information, the verifier can transform one
HOP's receipts to match what the HOP would have
generated, had it observed the same packet sequence as
another HOP. In our particular example, HOP 1 reports
observing packet $p_4$ before cutting point $p_5$, while HOP 4
reports observing it after the cutting point. Consequently,
the verifier would transform HOP 4's receipts by
"migrating" $p_4$ from the later to the earlier aggregate (i.e.,
decrementing the packet count of the later aggregate and
incrementing the packet count of the earlier one). With this
transformation, HOP 4's receipts correspond to the same
aggregates as HOP 1's receipts, hence the verifier can proceed
with the performance computation and verification of Section 4.
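
The "migration" step can be sketched for a single cutting point as follows. This is one possible reading of the transformation, not a definitive procedure: the function names are hypothetical, the windows are the AggTrans sequences reported by the two HOPs, and only packets that appear in both windows on opposite sides of the cutting point are migrated.

```python
from typing import List, Set, Tuple


def split_at(window: List[str], cut: str) -> Tuple[Set[str], Set[str]]:
    """Packets seen before the cutting point, and from it onwards."""
    i = window.index(cut)
    return set(window[:i]), set(window[i:])


def realign(counts: Tuple[int, int], ref_window: List[str],
            obs_window: List[str], cut: str) -> Tuple[int, int]:
    """Adjust one HOP's (earlier, later) aggregate counts to what it would
    have reported had it seen the reference HOP's packet order."""
    ref_before, ref_after = split_at(ref_window, cut)
    obs_before, obs_after = split_at(obs_window, cut)
    to_earlier = ref_before & obs_after  # seen too late at this HOP
    to_later = ref_after & obs_before    # seen too early at this HOP
    shift = len(to_earlier) - len(to_later)
    earlier, later = counts
    return earlier + shift, later - shift


# Example from the text: HOP 1 is the reference, HOP 4 saw p4 after p5.
hop1_window = ["p3", "p4", "p5", "p6"]
hop4_window = ["p2", "p3", "p5", "p4"]
hop4_counts = (3, 5)  # {p1,p2,p3} and {p5,p4,p6,p7,p8}
aligned = realign(hop4_counts, hop1_window, hop4_window, "p5")
```

After realignment, HOP 4's counts match the 4 and 4 packets of HOP 1's two aggregates, so the per-aggregate comparison of Section 4 applies again.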

If adding per-packet state to aggregate receipts sounds 
like too much overhead, take into account that a HOP is 
supposed to choose how many packets to assign to each 
aggregate according to its resources. E.g., a HOP may 
choose to cover minutes' worth of traffic with each aggre- 
gate; in this case, including in each per-aggregate receipt 
per-packet state for the few packets observed around the 
end of the aggregate is significantly less expensive than 
maintaining per-packet state. We quantify this per-packet 
overhead in Section 7.

7 Evaluation 

We now compute the resource overhead incurred by VPM 
domains, and quantify the quality with which each do- 
main's performance is estimated. 

We consider the case where HOP functionality is im- 
plemented in border routers, as part of a NetFlow-like 
monitoring platform that operates partly in the router's 
data-plane and partly in its control plane. The data- 
plane part handles per-packet operations and collects per- 
aggregate state in a monitoring cache; we refer to it as 
the collector module. The control-plane part periodically 
reads the state from the data-plane and performs further 
processing; we refer to it as the processor module. 

As a proof of concept, we implemented the collector 
and processor modules in Click (although, in a real router, 
the former would be implemented in hardware, close to 
the router's forwarding plane, e.g., as part of a NetFlow 
engine). Our implementation uses the "Bob" hash function
(because it has been shown to work well with Internet
traffic [19]) to compute packet digests and applies it
to each packet's IP and transport headers. The collec- 
tor's monitoring cache is updated from traffic traces (as 
opposed to actual network traffic). We used traces from a 
Tier-1 ISP, provided by CAIDA. 



7.1 Overhead 

Memory and Processing The amount of memory and
processing resources needed for the processor module is
tunable. The processor module reads receipts from the
monitoring cache and prepares them for storage or
dissemination. The rate at which new receipts appear in the
monitoring cache (hence need to be read and processed)
depends directly on the locally chosen sampling and
partition thresholds. Hence, a domain can directly control
the amount of memory and processing cycles spent by
the processor module by varying these two thresholds (a
demonstration of the resulting trade-off follows).

The collector module maintains state for each "active 
path," i.e., each source-destination origin-prefix pair that 
is currently sending traffic through the specific HOP; this 
per-path state consists at least of one "open" aggregate 
receipt (a PathID, AggID, and PktCnt — roughly 20 
bytes). E.g., if a HOP observes traffic from 100,000 paths
at the same time, it needs a 2MB monitoring cache. 

Moreover, the collector module maintains a temporary
packet buffer, where it stores $\langle \mathit{PktID}, \mathit{Time} \rangle$ pairs (4 and
3 bytes, respectively) for all packets observed within J
time units. At first, this seems to be cause for concern:
what happens with high-rate paths that observe millions
of packets per second? In reality, however, the per-packet
state that needs to be kept is modest. Recall that J is our
"safety threshold": when two packets are observed more
than J time units apart, we assume that they cannot be
reordered. A conservative choice is to set J to 10msec, an
order of magnitude above the millisecond threshold that
we need according to the latest Internet reordering
measurements we are aware of [10]. An OC-192 interface
observes at most 10Gbps. If we assume an average packet
size of 400B, 10Gbps corresponds to 3.125Mpps per
direction, which means that a HOP would need a 436KB
temporary buffer for each 10Gbps interface. Assuming an
(implausible) worst-case traffic of all minimum-size
packets, 10Gbps corresponds to 20Mpps per direction, which
means that a HOP would need a 2.8MB temporary buffer
for each 10Gbps interface. So, even assuming worst-case
traffic, the amount of buffering we need fits into a single
SRAM chip.
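
The buffer sizes quoted above can be checked with a short calculation. The factor of two (both directions of the interface share the buffer) is an assumption introduced here because it roughly reproduces the stated figures; the text itself only gives per-direction packet rates.

```python
RECORD_BYTES = 4 + 3  # PktID (4 bytes) + Time (3 bytes)
J_MSEC = 10           # safety threshold J
DIRECTIONS = 2        # assumption: buffer covers both directions


def buffer_bytes(pps_per_direction: int) -> int:
    """Bytes needed to hold one record per packet seen within J."""
    packets_in_window = pps_per_direction * J_MSEC // 1000
    return packets_in_window * RECORD_BYTES * DIRECTIONS


avg_pps = 10_000_000_000 // (400 * 8)  # 10Gbps at 400B per packet
avg_buffer = buffer_bytes(avg_pps)     # average-size packets
worst_buffer = buffer_bytes(20_000_000)  # 20Mpps, minimum-size packets
```

Under these assumptions the average case comes to 437,500 bytes, close to the 436KB quoted, and the worst case to 2,800,000 bytes, matching the 2.8MB figure.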

Finally, for each packet p, the collector looks up the
packet's PathID; computes Digest(p) and a timestamp;
updates the corresponding PktCnt; and stores the
digest and timestamp to the temporary packet buffer. This
amounts to three memory accesses, one hash function, and
one timestamp computation per packet. Moreover,
whenever a marker packet is observed, the HOP goes through
the temporary packet buffer and discards state for the
packets that are not delay-sampled, which adds one more
memory access per packet. Such processing, though not
currently supported by routers, is within the capabilities
of modern hardware and in line with the guidelines set by
the IETF Packet Sampling group [6].

Bandwidth We have said that each domain makes each 
receipt available to every other domain that observed the 
corresponding traffic. Whether this happens pro-actively 
(through a constant receipt stream) or on-demand (e.g., 
through a secure web interface), receipt dissemination in- 
troduces, in each path, bandwidth overhead that depends 
on (1) the number of HOPs on that path and (2) the rate at 
which each of these HOPs produces receipts. 

Again, this seems, at first, to be cause for concern — 
one could argue that introducing bandwidth overhead that 
grows with the total number of HOPs per path is not a 
"scalable" approach. In practice, this dependence on the 
number of HOPs is not a problem: Paths consist on average
of 3-4 domains, hence 4-6 HOPs (check the "Average
AS path length" and "Average address weighted AS path
length" entries in [2]). To be conservative, we consider
a 10-domain path, where each HOP puts on average an 
ambitious 1000 packets per aggregate and samples 1% of 
the path's packets. Given receipt size (22 bytes), this path 
will incur an overhead of 0.2 bytes per packet; assuming 
400 bytes per packet, this leads to a 0.046% bandwidth 
overhead for the path. 

Click Implementation As a proof of concept, we con- 
figured an eight-core Intel Nehalem server as a standard 
IPv4 router and fed to it a real trace. Then we mea- 
sured the router's performance with and without our VPM 
modules loaded and saw no difference (in both cases, 
the server routed 25Gbps). This is not surprising, given 
that, when fed realistic traffic, a Nehalem server is bottle- 
necked at the I/O, whereas our VPM modules burden the 
CPU. 

7.2 Quality 

Methodology We consider the case where domain X
from Figure 1 is congested, and X's delay performance
is estimated from its receipts. Each experiment consists
of: (1) extracting a packet sequence $\mathcal{S}$ from one of our
traces and considering the case where $\mathcal{S}$ is sent through
domain X; (2) simulating a scenario where the intra-domain
path between HOPs 4 and 5 is congested; (3) generating
the receipts that X would generate for packet sequence
$\mathcal{S}$; (4) estimating X's performance as a verifier would
estimate it based on X's receipts, i.e., using the technique
from [20]; (5) comparing that to X's actual performance.
For step 1, we use traces provided by CAIDA, collected
in 2008 from a Tier-1 ISP. When we say that we "extract a
packet sequence" from a trace, we mean that we extract all
packets that carry a given source and destination
origin-prefix pair. The point of using real traces is to verify that
our sampling and aggregation algorithms work well given
an actual packet stream, e.g., when a domain chooses its
sampling threshold so as to sample 1% of the observed
traffic, it indeed samples 1%. The results we show
correspond to a particular packet sequence (of 100,000 packets
per second), but all traces and packet sequences we tried
gave us consistent results.

Figure 2: The accuracy with which domain X's delay
performance is estimated as a function of X's sampling rate, for
different levels of loss, when X uses our sampling algorithm.
Congestion is caused by a bursty, high-rate UDP flow.

For step 2, we "introduce" loss and delay in the cho- 
sen packet sequence. To introduce loss, we discard a 
subset of the packets, chosen using the Gilbert-Elliot loss 
model Q. Introducing delay is more complicated, as we 
are not aware of any commonly acceptable delay model 
for Internet traffic. Instead, we use the NS simulator to 
create realistic congestion scenarios, and generate the se- 
quence of delay values that our packet sequence would 
encounter in each case. We consider different congestion 
scenarios, where long-lived TCP or UDP flows compete 
for/saturate the bandwidth of a bottleneck link, but show 
results only for the scenario that introduced the highest 
delay variance in the shortest time scale. 

Accuracy of Estimated Delay By reducing its sam- 
pling rate, a VPM domain can reduce the amount of re- 
sources it spends sampling, at the cost of its delay per- 
formance being estimated with lower accuracy. We now 
examine this trade-off. 

We run a set of experiments where we vary domain X's 
sampling rate. Figure 2 (consider the "No loss" curve)
shows the accuracy with which X's delay performance
is estimated, as a function of the sampling rate. We see
that reducing the sampling rate results in smooth accu-
racy degradation. Even if X samples only 0.1% of the
observed traffic, its delay performance is estimated with 
sub-millisecond accuracy. 
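
The trade-off between sampling rate and estimation accuracy can be reproduced in spirit with a small experiment. This is only a sketch under stated assumptions: the delays are synthetic (exponential with a 5 msec mean), packets are sampled uniformly at random, and a plain empirical quantile stands in for the estimation technique from [20].

```python
import random


def quantile(values, q):
    """Empirical q-quantile (nearest rank) of a non-empty list."""
    ordered = sorted(values)
    index = min(len(ordered) - 1, int(q * len(ordered)))
    return ordered[index]


def estimation_error(delays_ms, rate, q=0.9, seed=0):
    """Absolute error of the q-quantile estimated from a random sample."""
    rng = random.Random(seed)
    sample = [d for d in delays_ms if rng.random() < rate]
    if not sample:
        raise ValueError("sampling rate too low: no packets were sampled")
    return abs(quantile(sample, q) - quantile(delays_ms, q))


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical per-packet delays in msec; not taken from the paper's traces.
    delays = [rng.expovariate(1 / 5.0) for _ in range(100_000)]
    for rate in (0.1, 0.01, 0.001):
        print(f"rate={rate:.3f} error={estimation_error(delays, rate):.3f} msec")
```

The absolute numbers depend entirely on the assumed delay distribution, so they should not be compared with Figure 2; the sketch only shows how one would measure the error as a function of the sampling rate.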

Next, we examine how packet loss affects our sampling 
algorithm, hence the accuracy with which a VPM do- 
main's delay performance is estimated. We run a set of 
experiments where we vary both X's sampling rate and 
the amount of packet loss introduced by X. Figure 2
shows how accuracy degrades with lower sampling rate,
for different loss values. We see that, when X samples
1% of the observed traffic and 25% of this traffic is lost
within X, X's delay performance is still estimated with
an accuracy of 2 msec. This robustness in the face of loss
is due partly to our sampling algorithm and partly to
the estimation algorithm from [20] (which works well
even with few samples).

Figure 3: The granularity at which domain X's loss perfor-
mance is computed as a function of the loss rate introduced by
X, when X uses our aggregation algorithm.

Granularity of Computed Loss We now examine how 
packet loss affects the granularity at which a VPM do- 
main's loss performance is computed. 

We run a set of experiments where we fix X's aggre-
gation rate (such that it produces one aggregate every
100,000 packets) and vary the amount of packet loss in-
troduced by X. Figure 3 shows the granularity at which
X's loss performance can be computed, as a function
of the loss rate. We see that, when there is no loss,
X's loss performance can be computed over 1 sec peri-
ods (because X produces a new aggregate every 100,000
packets, which, for the particular packet sequence we are
considering, corresponds to 1 sec). As the level of loss
increases, granularity worsens, i.e., a verifier that col-
lects X's receipts cannot always compute X's loss per-
formance over 1 sec periods. However, the degradation
is, again, smooth: even if X loses 25% of the observed
traffic, its loss performance is computable over periods
of 1.5 sec. This robustness in the face of loss is due to
our aggregation algorithm, which maximizes the number
of common aggregates across HOPs, essentially enabling
HOPs not to fall "out of sync" when packets get lost.
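
To see why losing packets coarsens granularity, consider the following rough sketch. It assumes, purely for illustration, that each aggregate is closed by a boundary packet and that a receipt is a (boundary id, packet count) pair; when a boundary packet is lost inside X, the egress HOP misses it and the verifier must merge the corresponding ingress aggregates. The receipt layout and function name are hypothetical and do not reproduce the paper's aggregation algorithm.

```python
def loss_per_common_aggregate(ingress, egress):
    """Compute loss over the aggregates whose boundaries both HOPs observed.

    ingress, egress: ordered lists of (boundary_id, packet_count) receipts.
    Returns a list of (boundary_id, packets_in, packets_out, loss_rate).
    """
    common = {b for b, _ in ingress} & {b for b, _ in egress}

    def merge(receipts):
        # Fold every aggregate into the next one that ends at a common boundary.
        merged, running = {}, 0
        for boundary, count in receipts:
            running += count
            if boundary in common:
                merged[boundary] = running
                running = 0
        return merged

    merged_in, merged_out = merge(ingress), merge(egress)
    results = []
    for boundary, _ in ingress:
        if boundary in common:
            n_in, n_out = merged_in[boundary], merged_out[boundary]
            results.append((boundary, n_in, n_out, 1 - n_out / n_in))
    return results


if __name__ == "__main__":
    ingress = [("a", 100), ("b", 100), ("c", 100)]
    egress = [("a", 90), ("c", 160)]  # boundary "b" was lost inside the domain
    for row in loss_per_common_aggregate(ingress, egress):
        print(row)
```

In this example the loss over the first aggregate is still computable on its own, but the second and third can only be evaluated together, i.e., over a period twice as long.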

Verifiability We have demonstrated that a VPM do-
main's loss and delay performance can be accurately es-
timated from its receipts, even when the domain samples
1% of the observed traffic, puts hundreds of thousands of
packets into a single aggregate, and is severely congested
(to the point of losing more than 25% of the observed
traffic). The next question is, can such a domain's per-
formance also be verified with this same quality, i.e., will
the domain be caught if it lies?

The answer depends, of course, on how many resources 
the domain's neighbors devote to sampling and aggrega-
tion. Suppose, for instance, that domain L from Figure 1
collects receipts from X and N. Figure 2 gives some con-
crete numbers: if X samples at 1% and loses 25% of the
observed traffic, L can estimate X's delay performance
with accuracy 2 msec. If N samples at the same rate, L
can also verify X's performance with the same accuracy.
However, if N samples at 0.1%, then L can only verify
X's delay performance with accuracy 5 msec.
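
One way to picture the check that L performs is the following minimal sketch. It assumes, hypothetically, that a delay receipt maps a packet digest to a timestamp in msec, that the clocks of X and N are synchronized, and that max_link_delay_ms bounds the delay of the link between them; none of these names or formats come from the paper.

```python
def common_samples(x_egress, n_ingress):
    """Digests of packets sampled both by X (on exit) and by N (on entry)."""
    return x_egress.keys() & n_ingress.keys()


def inconsistent_samples(x_egress, n_ingress, max_link_delay_ms):
    """Digests for which X's and N's receipts cannot both be truthful.

    x_egress, n_ingress: dicts mapping packet digest -> timestamp in msec.
    A pair is consistent if N saw the packet no earlier than X reported
    sending it, and no later than the assumed link-delay bound allows.
    """
    return sorted(
        digest
        for digest in common_samples(x_egress, n_ingress)
        if not 0 <= n_ingress[digest] - x_egress[digest] <= max_link_delay_ms
    )


if __name__ == "__main__":
    x_egress = {"p1": 10.0, "p2": 20.0, "p3": 30.0}
    n_ingress = {"p1": 10.5, "p3": 42.0}  # N samples fewer packets than X
    print(len(common_samples(x_egress, n_ingress)))
    print(inconsistent_samples(x_egress, n_ingress, max_link_delay_ms=1.0))
```

Only the samples that both domains report can be cross-checked, which is why a neighbor that samples at a lower rate limits the accuracy with which X's performance can be verified.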

To summarize, a VPM domain's choice of sampling 
and aggregation rate determines, first, with what quality 
its own performance can be estimated by its customers 
and peers; second, to what extent its receipts can be used 
to verify the performance of its neighbors. 

8 Discussion and Related Work 

Partial Deployment If domain X in path P has not de-
ployed VPM, but its neighbors have, then X's neighbors
are free to blame their performance problems on X (since 
X does not produce any receipts to refute their claims). 
We view this as an incentive for deployment: a domain 
has to report on its performance in order to prevent its 
neighbors from blaming their problems on it. Conversely, 
if X is the only domain in P that has deployed VPM, its 
performance reports may not be verified by its neighbors, 
but they are still verifiable. So, during a congestion in-
cident, X can still position itself as the "good" ISP that
provides troubleshooting information to its customers — it 
is not its fault that the other ISPs on the path are not up to 
the task. X can even use this as an incentive to encour- 
age multi-network customers to connect all their networks 
through X — since that way they avoid domains that do 
not provide troubleshooting information. 

Related Work The Packet Obituaries protocol ID and 
the fault-localization protocols from [11] inform traffic
sources where individual packets get lost or corrupted. 
Audit provides source domains with similar per-TCP- 
flow information IH. VPM is similar to these protocols 
in that it relies on in-path elements collecting and ex- 
porting traffic statistics; it also borrows the concept of 
report consistency from Audit. VPM's novel elements 
are delay-sampling and tunable reporting; based on these 
techniques, it avoids the overheads necessary for collect- 
ing and propagating per-packet or per-flow state, while 
maintaining the verifiability property. 

In Trajectory Sampling, routers within an ISP sample 
packets using a hash function and record their digests, 
with the purpose of inferring the internal paths (sequences 
of routers) followed by packets IS). The Lossy Differ- 
ence Aggregator enables two monitoring points to mea- 
sure the loss and average delay between them by main- 
taining packet counts and average timestamps for packet 
aggregates [15]. We use ideas from both protocols (hash-
based sampling, per-aggregate counts), but, as explained
in Section 3, neither of them could provide the computabil-
ity and verifiability properties necessary in our context.
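
The idea of per-aggregate packet counts and timestamp sums can be sketched as follows. This is a simplified, single-bank illustration: the class and function names are hypothetical, and the bucket-assignment hash is an assumption rather than the published design.

```python
import hashlib


def bucket_of(packet_bytes, n_buckets):
    """Map a packet to a bucket; both monitoring points use the same mapping."""
    digest = hashlib.sha256(packet_bytes).digest()
    return int.from_bytes(digest[:4], "big") % n_buckets


class Aggregator:
    """Per-bucket packet counts and timestamp sums at one monitoring point."""

    def __init__(self, n_buckets):
        self.counts = [0] * n_buckets
        self.time_sums = [0.0] * n_buckets

    def observe(self, packet_bytes, timestamp_ms):
        b = bucket_of(packet_bytes, len(self.counts))
        self.counts[b] += 1
        self.time_sums[b] += timestamp_ms


def loss_and_average_delay(upstream, downstream):
    """Return (lost packets, average delay in msec or None).

    Only buckets with equal counts on both sides contribute to the delay
    average, because a lost packet corrupts its bucket's timestamp sum.
    """
    lost = sum(upstream.counts) - sum(downstream.counts)
    usable, delay_sum = 0, 0.0
    for c_up, c_down, t_up, t_down in zip(upstream.counts, downstream.counts,
                                          upstream.time_sums, downstream.time_sums):
        if c_up == c_down and c_up > 0:
            usable += c_up
            delay_sum += t_down - t_up
    return lost, (delay_sum / usable if usable else None)
```

Such a structure yields loss and average delay between two cooperating points, which is consistent with the remark above that it does not, by itself, provide the properties needed here.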

The "Secure Sampling" technique from [12] is useful
when two entities, say Alice and Bob, want to measure 
the delay of the path between them by considering only a 
sample of the packets they exchange. To prevent interme- 
diate nodes from treating the samples preferentially, Alice 
and Bob agree on which packets to sample in such a way 
that the intermediate nodes cannot guess which are the 
samples. This technique is clearly not applicable to our 
problem: we are not looking to hide the samples from the 
intermediate nodes, we are looking to force the intermedi- 
ate nodes to sample honestly — in our context, the entities 
that perform the sampling (the domains) are precisely the 
ones that may bias the samples. 

The "Secure Sketch" technique from [12] enables Al-
ice and Bob to detect when the packets they exchange
are lost, delayed, or modified beyond a certain level. To 
this end, both Alice and Bob maintain a sketch (in some 
sense, a summary) of all the packets they have exchanged; 
at the end, Alice sends her sketch to Bob, who com- 
pares the sketches and detects whether any of the above 
problems occurred. This technique is related to VPM in
the same way as the Lossy Difference Aggregator: we
could combine it with the strawman to build a mechanism 
that determines whether each domain modified packets 
beyond a certain level; however, it would not enable the 
estimation of delay quantiles. 

Finally, VPM can be viewed as a "performance ac-
countability mechanism," which holds domains account-
able for their performance. An economic analysis has
shown that such a performance accountability mecha-
nism would foster ISP competition and innovation [16].

9 Conclusions 

We have presented VPM, a system by which network do- 
mains can estimate and verify each other's loss and delay 
performance. VPM relies on domains producing and ex-
changing receipts for the traffic they receive and deliver.
A domain can estimate a neighbor's performance by pro- 
cessing the receipts produced by the neighbor; it can ver- 
ify that the neighbor's receipts are honest by comparing 
them to the receipts produced by other domains for the 
same traffic. If a domain lies about its performance, that 
leads to receipt inconsistencies and exposes the liar to its 
neighbors. VPM comes at the cost of deploying (modest) 
new functionality at domain boundaries. The processing, 
memory, and bandwidth overhead incurred by a deploy- 
ing domain is configurable and independently determined 
by the domain. 